Model Selection

Proximal Policy Optimization

# Proximal Policy Optimization

Stable Vicuna 13b Delta

StableVicuna-13B is a fine-tuned version of the Vicuna-13B v0 model, enhanced through Reinforcement Learning from Human Feedback (RLHF) and Proximal Policy Optimization (PPO) on various dialogue and instruction datasets.

Large Language Model

Transformers English

Ppo LunarLander V2

This is a reinforcement learning model based on the PPO algorithm, specifically designed to solve the landing task in the LunarLander-v2 environment.

Featured Recommended AI Models

AIbase

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

© 2025AIbase